Associative Memristive Memory for Approximate Computing in GPUs
Abstract
Associative memory, in the form of lookup tables, is a promising approach to improving energy efficiency by enabling computing-with-memory. A processing element can be tightly coupled with an associative memory in which function responses are pre-stored. The associative memory can recall function responses for a subset of input values, thereby avoiding the actual function execution on the processing element and saving energy. One challenge, however, is to reduce the energy consumption of the associative memory modules themselves. In this paper, we address the challenge of designing ultra-low-power associative memories. We first use memristive parts for the memory implementation and demonstrate the energy-saving potential of integrating associative memristive memory (AMM) into graphics processing units (GPUs). Next, we leverage approximate computing, which takes advantage of application-level tolerance to errors, to enable voltage overscaling and further reduce the energy consumption of an AMM module. Voltage overscaling deliberately relaxes the search criteria of an AMM: the AMM module finds stored patterns that match an input search pattern with a Hamming distance of 0, 1, or 2. This controllable inexact matching introduces some errors into the computation, which are tolerable for the target applications. The energy consumption is further reduced by employing a purely resistive crossbar architecture for the AMM module. To evaluate our solution, we tightly integrate AMM modules with the floating point units (FPUs) of an AMD Southern Islands GPU and run four image processing kernels on the AMM-integrated GPU. Our experimental results show that the AMM modules reduce the energy consumption of running these kernels on the GPU by 23%–45% on average, compared to the baseline GPU without AMM modules.
We also show that these image processing kernels can tolerate the errors resulting from approximate search operations with an acceptable degradation of image quality, i.e., a PSNR greater than 30 dB.
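The matching behavior described in the abstract can be illustrated with a small software model. This is only a functional sketch under stated assumptions, not the authors' design: the real AMM is a memristive crossbar whose relaxed matching comes from voltage overscaling, whereas here the relaxation is modeled as an explicit Hamming-distance threshold. The class name `ApproxLookup`, the helper functions, and the choice of keying the table on single-precision bit patterns are all hypothetical.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_bits: reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


class ApproxLookup:
    """Hypothetical software model of an associative lookup table.

    Responses of `func` are pre-stored for a fixed set of inputs. A query
    is served from the table when a stored pattern lies within
    `max_distance` bit flips of the query pattern; otherwise the function
    is actually executed (standing in for the FPU).
    """

    def __init__(self, func, inputs, max_distance=0):
        self.func = func
        self.max_distance = max_distance  # 0 = exact match, 1 or 2 = relaxed
        self.table = [(float_bits(x), func(x)) for x in inputs]
        self.hits = 0
        self.misses = 0

    def __call__(self, x: float) -> float:
        key = float_bits(x)
        best = None  # (distance, response) of the closest admissible pattern
        for pattern, response in self.table:
            d = hamming(key, pattern)
            if d <= self.max_distance and (best is None or d < best[0]):
                best = (d, response)
        if best is not None:
            self.hits += 1      # recalled from memory, no execution needed
            return best[1]
        self.misses += 1        # fall back to real execution
        return self.func(x)


if __name__ == "__main__":
    # A query that differs from the stored input 4.0 in one mantissa bit.
    near4 = bits_to_float(float_bits(4.0) ^ 1)
    exact = ApproxLookup(math.sqrt, [1.0, 4.0, 9.0], max_distance=0)
    relaxed = ApproxLookup(math.sqrt, [1.0, 4.0, 9.0], max_distance=1)
    print(exact(near4), exact.misses)     # computed: no exact pattern stored
    print(relaxed(near4), relaxed.hits)   # recalled: approximate match on 4.0
```

In this model, raising `max_distance` increases the hit rate at the cost of accuracy. Note that a threshold applied uniformly over all 32 bits also admits flips in exponent bits, which can produce large errors; how the error is kept within the tolerable range is a matter for the paper itself and is not modeled here.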
Similar resources
Emulating long-term synaptic dynamics with memristive devices
The potential of memristive devices is often seen in implementing neuromorphic architectures for achieving brain-like computation. However, unlike CMOS technology, the design procedures do not allow for extended manipulation of the material; the properties of the memristive material should be harnessed in the context of such computation, under the view that biological synapses are memristo...
Pattern Classification by Memristive Crossbar Circuits with Ex-situ and In-situ Training
The development of artificial neural networks (ANNs) based on emerging non-volatile memory, such as metal oxide memristors, has recently attracted increasing interest. In the simplest form of such ANNs, the neurons are implemented with conventional (complementary metal-oxide-semiconductor) technology and interconnected by memristors functioning as artificial synapses. We will first introduce...
Fusion Coherence: Scalable Cache Coherence for Heterogeneous Kilo-Core System
Future heterogeneous systems will integrate CPUs and GPUs on a single chip to achieve high computing performance as well as high throughput. In general, such a system would discard the current discrete pattern and build a unified shared memory system, avoiding explicit data movement among CPUs and GPUs connected by a high-throughput NoC. We propose a scalable cache coherence solution Fusion Coherence ...
Resistive Memory for Approximate Program Acceleration
The Internet of Things significantly increases the amount of data generated, which strains the processing capability of current computing systems. Approximate computing can accelerate the computation and dramatically reduce the energy consumption with controllable accuracy loss. In this paper, we propose a Resistive Associative Unit, called RAU, which approximates computation alongside processing...
8T SRAM Cell as a Multi-bit Dot Product Engine for Beyond von-Neumann Computing
Large-scale digital computing almost exclusively relies on the von-Neumann architecture, which comprises separate units for storage and computation. The energy-expensive transfer of data from the memory units to the computing cores results in the well-known von-Neumann bottleneck. Various approaches aimed at bypassing the von-Neumann bottleneck are being extensively explored in the lite...
Journal title:
- IEEE J. Emerg. Sel. Topics Circuits Syst.
Volume 6, Issue
Pages -
Publication date: 2016